In this project, you'll define and train a Generative Adversarial Network of your own creation on a dataset of faces. Your goal is to get a generator network to generate new images of faces that look as realistic as possible!
The project will be broken down into a series of tasks, from defining new architectures to training adversarial networks. At the end of the notebook, you'll be able to visualize the results of your trained Generator to see how it performs; your generated samples should look like fairly realistic faces with small amounts of noise.
You'll be using the CelebFaces Attributes Dataset (CelebA) to train your adversarial networks.
This dataset has higher resolution images than the datasets you have previously worked with (like MNIST or SVHN), so you should prepare to define deeper networks and train them for a longer time to get good results. It is suggested that you utilize a GPU for training.
Since the project's main focus is on building the GANs, we've done some of the pre-processing for you. Each of the CelebA images has been cropped to remove parts of the image that don't include a face, then resized down to 64x64x3 NumPy images. Some sample data is shown below.

If you are working locally, you can download this data by clicking here
This is a zip file that you'll need to extract in the home directory of this notebook for further loading and processing. After extracting the data, you should be left with a data directory, processed-celeba-small/.
# run this once to unzip the file
# !unzip processed-celeba-small.zip
import warnings
warnings.filterwarnings('ignore')
import matplotlib.pyplot as plt
import numpy as np
import tests
import random
from datetime import datetime
from glob import glob
from PIL import Image
from typing import Tuple, Callable, Dict
import torch
import torch.optim as optim
import torch.nn as nn
from torch.nn import Module
from torch.utils.data import DataLoader, Dataset
from torchvision.transforms import Compose, Resize, ToTensor, Normalize
data_dir = 'processed_celeba_small/celeba/'
The CelebA dataset contains over 200,000 celebrity images with annotations. Since you're going to be generating faces, you won't need the annotations; you'll only need the images. Note that these are color images with 3 color channels (RGB) each.
Since the project's main focus is on building the GANs, we've done some of the pre-processing for you. Each of the CelebA images has been cropped to remove parts of the image that don't include a face, then resized down to 64x64x3 NumPy images. This pre-processed dataset is a smaller subset of the very large CelebA dataset and contains roughly 30,000 images.
Your first task is to build the dataloader. To do so, you need to do the following:
The get_transforms function should output a torchvision.transforms.Compose of different transformations. You have two constraints:
def get_transforms(size: Tuple[int, int]) -> Callable:
    """Transforms to apply to the image."""
    transforms = [Resize(size), ToTensor(), Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))]
    return Compose(transforms)
The DatasetDirectory class is a torch Dataset that reads from the above data directory. The __getitem__ method should output a transformed tensor and the __len__ method should output the number of files in our dataset. You can look at this custom dataset for ideas.
class DatasetDirectory(Dataset):
    """
    A custom dataset class that loads images from a folder.
    args:
    - directory: location of the images
    - transforms: transform function to apply to the images
    - extension: file format
    """
    def __init__(self, directory: str, transforms: Callable = None, extension: str = '.jpg'):
        self.directory = directory
        self.transforms = transforms
        self.extension = extension
        # glob once at construction so __len__ and __getitem__ don't re-scan the directory
        self.file_list = glob(self.directory + '*' + self.extension)

    def __len__(self) -> int:
        """Returns the number of items in the dataset."""
        return len(self.file_list)

    def __getitem__(self, index: int) -> torch.Tensor:
        """Load an image and apply the transformation."""
        img = Image.open(self.file_list[index])
        return self.transforms(img)
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""
# run this cell to verify your dataset implementation
size = (64, 64)
dataset = DatasetDirectory(data_dir, get_transforms(size))
print("Dataset length = {}".format(len(dataset)))
tests.check_dataset_outputs(dataset)
num_test_images = 10
for i in range(num_test_images):
    idx = random.randrange(len(dataset))
    img = dataset[idx]
    print("image#: {} | Min: {} | Max: {}".format(idx, img.min(), img.max()))
The functions below will help you visualize images from the dataset.
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""
def denormalize(images):
    """Transform images from [-1.0, 1.0] to [0, 255] and cast them to uint8."""
    return ((images + 1.) / 2. * 255).astype(np.uint8)
# plot the images in the batch, along with the corresponding labels
fig = plt.figure(figsize=(20, 4))
plot_size = 20
for idx in np.arange(plot_size):
    ax = fig.add_subplot(2, int(plot_size/2), idx+1, xticks=[], yticks=[])
    img = dataset[idx].numpy()
    img = np.transpose(img, (1, 2, 0))
    img = denormalize(img)
    ax.imshow(img)
As you know, a GAN is comprised of two adversarial networks, a discriminator and a generator. Now that we have a working data pipeline, we need to implement the discriminator and the generator.
Feel free to implement any additional class or function.
The discriminator's job is to score real and fake images. You have two constraints here:
Feel free to get inspiration from the different architectures we talked about in the course, such as DCGAN, WGAN-GP or DRAGAN.
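Before wiring up the layers, it can help to check the downsampling arithmetic. The sketch below (using a hypothetical `conv_out_size` helper, not part of the project code) applies the standard convolution output-size formula for the kernel=4, stride=2, padding=1 setting used in the blocks that follow:

```python
# Spatial size after a conv layer: out = (in + 2*padding - kernel) // stride + 1.
# With kernel=4, stride=2, padding=1 each block exactly halves the feature map.
def conv_out_size(size, kernel=4, stride=2, padding=1):
    return (size + 2 * padding - kernel) // stride + 1

size = 64
for layer in range(4):
    size = conv_out_size(size)
    print(size)  # 32, 16, 8, 4
```

Four such blocks take a 64x64 input down to 4x4, which is why the final fully-connected layer below expects conv_dim*8*4*4 input features.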
You can use Conv2d layers with the correct hyperparameters, or Pooling layers, to reduce the spatial dimension.

def ConvBlock(in_channels: int, out_channels: int, in_spatial_dim: Tuple[int, int], kernel_size: int, batch_norm: bool = True, layer_norm: bool = False):
    """
    A convolutional block is made of 3 layers: Conv -> BatchNorm -> Activation.
    args:
    - in_channels: number of channels in the input to the conv layer
    - out_channels: number of filters in the conv layer
    - in_spatial_dim: spatial size of the input feature map
    - kernel_size: filter dimension of the conv layer
    - batch_norm: whether to use batch norm or not
    - layer_norm: whether to use layer norm or not
    """
    layers = []
    conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride=2, padding=1, bias=False)
    layers.append(conv)
    if batch_norm:
        bn = nn.BatchNorm2d(out_channels)
        layers.append(bn)
    if layer_norm:
        # out_spatial_dim = ((in_spatial_dim + 2*padding - kernel_size) // stride) + 1, with padding=1 and stride=2
        out_spatial_dim = ((np.asarray(in_spatial_dim) + 2 - kernel_size) // 2) + 1
        ln = nn.LayerNorm([out_channels, out_spatial_dim[0], out_spatial_dim[1]], bias=False)
        layers.append(ln)
    activation = nn.LeakyReLU(0.2)
    layers.append(activation)
    return nn.Sequential(*layers)
class Discriminator(Module):
    def __init__(self, conv_dim: int = 64, size: Tuple[int, int] = (64, 64), use_batch_norm: bool = True, use_layer_norm: bool = False):
        super(Discriminator, self).__init__()
        # 3x64x64 --> conv_dim x 32x32. First layer: no batch_norm or layer_norm.
        self.conv1 = ConvBlock(in_channels=3, out_channels=conv_dim, in_spatial_dim=size, kernel_size=4, batch_norm=False, layer_norm=False)
        # conv_dim x 32x32 --> (conv_dim*2) x 16x16
        self.conv2 = ConvBlock(in_channels=conv_dim, out_channels=conv_dim*2, in_spatial_dim=tuple(np.divide(size, 2).astype(int)), kernel_size=4, batch_norm=use_batch_norm, layer_norm=use_layer_norm)
        # (conv_dim*2) x 16x16 --> (conv_dim*4) x 8x8
        self.conv3 = ConvBlock(in_channels=conv_dim*2, out_channels=conv_dim*4, in_spatial_dim=tuple(np.divide(size, 4).astype(int)), kernel_size=4, batch_norm=use_batch_norm, layer_norm=use_layer_norm)
        # (conv_dim*4) x 8x8 --> (conv_dim*8) x 4x4
        self.conv4 = ConvBlock(in_channels=conv_dim*4, out_channels=conv_dim*8, in_spatial_dim=tuple(np.divide(size, 8).astype(int)), kernel_size=4, batch_norm=use_batch_norm, layer_norm=use_layer_norm)
        self.flatten = nn.Flatten()
        self.fc = nn.Linear(conv_dim*8*4*4, 1)  # final, fully-connected layer

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        x = self.conv4(x)
        x = self.flatten(x)
        x = self.fc(x)
        return x
device = torch.device("cuda" if torch.cuda.is_available() else "cpu") # Use GPU if available
print("Device = {}".format(device))
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""
# run this cell to check your discriminator implementation
discriminator = Discriminator(conv_dim=64, size=size, use_batch_norm=True, use_layer_norm=False).to(device)
print(discriminator)
tests.check_discriminator(discriminator)
The generator's job is to create the "fake images" and learn the dataset distribution. You have three constraints here:
The input is a latent vector of shape [batch_size, latent_dimension, 1, 1]. Feel free to get inspiration from the different architectures we talked about in the course, such as DCGAN, WGAN-GP or DRAGAN.
You can use ConvTranspose2d layers to increase the spatial dimension.

def DeconvBlock(in_channels: int, out_channels: int, in_spatial_dim: Tuple[int, int], kernel_size: int, stride: int, padding: int, batch_norm: bool = True, layer_norm: bool = False):
    """
    A "de-convolutional" block: ConvTranspose -> BatchNorm/LayerNorm.
    The activation is applied separately, in the Generator's forward pass.
    args:
    - in_channels: number of channels in the input to the conv layer
    - out_channels: number of filters in the conv layer
    - in_spatial_dim: spatial size of the input feature map
    - kernel_size: filter dimension of the conv layer
    - stride: stride of the conv layer
    - padding: padding of the conv layer
    - batch_norm: whether to use batch norm or not
    - layer_norm: whether to use layer norm or not
    """
    layers = []
    deconv = nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=stride, padding=padding, bias=False)
    layers.append(deconv)
    if batch_norm:
        bn = nn.BatchNorm2d(out_channels)
        layers.append(bn)
    if layer_norm:
        out_spatial_dim = (np.asarray(in_spatial_dim) - 1) * stride - 2 * padding + kernel_size
        ln = nn.LayerNorm([out_channels, out_spatial_dim[0], out_spatial_dim[1]], bias=False)
        layers.append(ln)
    return nn.Sequential(*layers)
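The upsampling arithmetic mirrors the discriminator's downsampling. This sketch (with a hypothetical `deconv_out_size` helper, not part of the project code) applies the transposed-convolution output-size formula used in the `layer_norm` branch above:

```python
# Transposed-conv output size: out = (in - 1)*stride - 2*padding + kernel.
# With kernel=4, stride=2, padding=1 each block exactly doubles the feature map.
def deconv_out_size(size, kernel=4, stride=2, padding=1):
    return (size - 1) * stride - 2 * padding + kernel

size = 4
for layer in range(4):
    size = deconv_out_size(size)
    print(size)  # 8, 16, 32, 64
```

Four such blocks take the reshaped 4x4 projection of the latent vector back up to the 64x64 output resolution.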
class Generator(Module):
    def __init__(self, latent_dim: int = 128, conv_dim: int = 64, use_batch_norm: bool = True, use_layer_norm: bool = False):
        super(Generator, self).__init__()
        self.conv_dim = conv_dim
        self.fc = nn.Linear(latent_dim, conv_dim*8*4*4)  # conv_dim*8*4*4: same as the flattened size feeding the discriminator's final layer
        in_spatial_dim_init = (4, 4)
        # (conv_dim*8) x 4x4 --> (conv_dim*4) x 8x8
        self.deconv1 = DeconvBlock(in_channels=conv_dim*8, out_channels=conv_dim*4, in_spatial_dim=in_spatial_dim_init, kernel_size=4, stride=2, padding=1, batch_norm=use_batch_norm, layer_norm=use_layer_norm)
        # (conv_dim*4) x 8x8 --> (conv_dim*2) x 16x16
        self.deconv2 = DeconvBlock(in_channels=conv_dim*4, out_channels=conv_dim*2, in_spatial_dim=tuple(np.multiply(in_spatial_dim_init, 2).astype(int)), kernel_size=4, stride=2, padding=1, batch_norm=use_batch_norm, layer_norm=use_layer_norm)
        # (conv_dim*2) x 16x16 --> conv_dim x 32x32
        self.deconv3 = DeconvBlock(in_channels=conv_dim*2, out_channels=conv_dim, in_spatial_dim=tuple(np.multiply(in_spatial_dim_init, 4).astype(int)), kernel_size=4, stride=2, padding=1, batch_norm=use_batch_norm, layer_norm=use_layer_norm)
        # conv_dim x 32x32 --> 3x64x64. Last layer: no batch_norm or layer_norm.
        self.deconv4 = DeconvBlock(in_channels=conv_dim, out_channels=3, in_spatial_dim=tuple(np.multiply(in_spatial_dim_init, 8).astype(int)), kernel_size=4, stride=2, padding=1, batch_norm=False, layer_norm=False)
        self.activation = nn.ReLU()
        self.last_activation = nn.Tanh()

    def forward(self, x):
        x = self.fc(x)
        x = x.view(-1, self.conv_dim*8, 4, 4)  # re-shape as (batch_size, depth, 4, 4)
        x = self.activation(self.deconv1(x))
        x = self.activation(self.deconv2(x))
        x = self.activation(self.deconv3(x))
        x = self.last_activation(self.deconv4(x))
        return x
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""
# run this cell to verify your generator implementation
generator = Generator(latent_dim=128, conv_dim=64, use_batch_norm=True, use_layer_norm=False).to(device)
print(generator)
tests.check_generator(generator, latent_dim=128)
In the following section, we create the optimizers for the generator and discriminator. You may want to experiment with different optimizers, learning rates and other hyperparameters as they tend to impact the output quality.
# Optimizer parameters
learning_rate_gen = 0.0002
learning_rate_disc = 0.0002
beta1 = 0.5 # reduce from 0.9 for GAN stability
beta2 = 0.999 # default
def create_optimizers(generator: Module, discriminator: Module):
    """This function should return the optimizers of the generator and the discriminator."""
    g_optimizer = optim.Adam(generator.parameters(), learning_rate_gen, betas=(beta1, beta2))
    d_optimizer = optim.Adam(discriminator.parameters(), learning_rate_disc, betas=(beta1, beta2))
    return g_optimizer, d_optimizer
In this section, we are going to implement the loss functions for the generator and the discriminator. You can and should experiment with different loss functions.
Some tips:
def logits_loss(logits, real_or_fake='real'):
    batch_size = logits.size(0)
    if real_or_fake == 'real':
        labels = torch.ones(batch_size) * 0.9  # smooth real labels to 0.9
    elif real_or_fake == 'fake':
        labels = torch.zeros(batch_size)  # fake labels = 0
    labels = labels.to(device)
    criterion = nn.BCEWithLogitsLoss()
    loss = criterion(logits.squeeze(), labels)
    return loss
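To see what label smoothing buys us, here is a pure-Python sketch of the loss computed above. The `bce_with_logits` function is a scalar stand-in for `nn.BCEWithLogitsLoss`, written out only to make the arithmetic visible:

```python
import math

# BCEWithLogits for a single logit x and target y:
# loss = -(y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x)))
def bce_with_logits(x, y):
    p = 1.0 / (1.0 + math.exp(-x))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# With a hard target of 1.0, a very confident "real" logit drives the loss to ~0.
# With a smoothed target of 0.9, the loss bottoms out above zero, so the
# discriminator is discouraged from becoming arbitrarily confident.
print(round(bce_with_logits(10.0, 1.0), 4))  # 0.0
print(round(bce_with_logits(10.0, 0.9), 4))  # 1.0
```

In other words, smoothing penalizes overconfident real-image scores, which tends to stabilize GAN training.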
def wasserstein_loss(real_logits, fake_logits, module_type):
    """
    Wasserstein Loss
    args:
    - real_logits: vector of logits output by the discriminator for a real input image
    - fake_logits: vector of logits output by the discriminator for a fake input image
    - module_type: 'discriminator' or 'generator'
    """
    if module_type == 'discriminator':
        real_loss = -real_logits.mean()
        fake_loss = fake_logits.mean()
        total_loss = real_loss + fake_loss
    elif module_type == 'generator':
        fake_loss = -fake_logits.mean()
        total_loss = fake_loss
    return total_loss
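The signs in the Wasserstein loss can be checked with toy numbers (these scores are invented for illustration only): the critic minimizes `-mean(real) + mean(fake)`, which pushes real and fake scores apart, while the generator minimizes `-mean(fake)`, which pushes its samples' scores up.

```python
# Toy critic scores for a batch of two real and two fake samples
real_scores = [2.0, 3.0]
fake_scores = [-2.0, 0.0]

mean = lambda xs: sum(xs) / len(xs)
critic_loss = -mean(real_scores) + mean(fake_scores)
gen_loss = -mean(fake_scores)
print(critic_loss)  # -3.5 (more negative = real/fake scores further apart)
print(gen_loss)     # 1.0  (shrinks as fake scores rise)
```

Note there is no sigmoid or log here: the critic outputs unbounded scores, which is why the 1-Lipschitz constraint (gradient penalty below) is needed to keep training well-behaved.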
The generator's goal is to get the discriminator to think its generated images (= "fake" images) are real.
def generator_loss(fake_logits, loss_type):
    """Generator loss, takes the fake scores as inputs."""
    if loss_type == 'bce':
        loss = logits_loss(fake_logits, real_or_fake='real')
    elif loss_type == 'wasserstein':
        loss = wasserstein_loss(None, fake_logits, 'generator')
    return loss
We want the discriminator to give high scores to real images and low scores to fake ones and the discriminator loss should reflect that.
def discriminator_loss(real_logits, fake_logits, gp, loss_type):
    """Discriminator loss, takes the fake and real logits as inputs."""
    if loss_type == 'bce':
        loss = logits_loss(real_logits, real_or_fake='real') + logits_loss(fake_logits, real_or_fake='fake') + lambda_p * gp
    elif loss_type == 'wasserstein':
        loss = wasserstein_loss(real_logits, fake_logits, 'discriminator') + lambda_p * gp
    return loss
In the course, we discussed the importance of gradient penalty in training certain types of GANs. Implementing this function is not required and depends on some of the design decisions you made (discriminator architecture, loss functions).
lambda_p = 10 # lambda for GP
def gradient_penalty(real_sample, fake_sample, critic, gp_type):
    """
    This function enforces the 1-Lipschitz constraint on the discriminator.
    Gradient penalty of the WGAN-GP (or DRAGAN) model.
    args:
    - real_sample: sample from the real dataset
    - fake_sample: generated sample
    - critic: the discriminator network
    - gp_type: penalty type (None, 'wgangp' or 'dragan')
    returns:
    - gradient penalty
    """
    gp = 0
    if gp_type is not None:
        # sample a random point between both distributions
        alpha = torch.rand(real_sample.shape).to(device)
        if gp_type == 'wgangp':
            x_hat = alpha * real_sample + (1 - alpha) * fake_sample
        elif gp_type == 'dragan':
            # DRAGAN perturbs the real sample instead of interpolating with the fake one
            X_p = real_sample + 0.5 * real_sample.std() * torch.rand_like(real_sample).to(device)
            x_hat = alpha * real_sample + (1 - alpha) * X_p
        # calculate the gradient of the critic's output w.r.t. x_hat
        if not x_hat.requires_grad:
            x_hat.requires_grad = True
        pred = critic(x_hat)
        grad = torch.autograd.grad(pred, x_hat, grad_outputs=torch.ones_like(pred).to(device), create_graph=True)[0]
        # calculate the per-sample gradient norm and the final penalty
        norm = grad.view(grad.size(0), -1).norm(2, dim=1)
        gp = ((norm - 1) ** 2).mean()
    return gp
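The shape of the penalty is easy to see on a toy critic. For a hypothetical linear critic f(x) = a*x (not the project's discriminator), the gradient with respect to the input is the constant a, so the penalty reduces to (|a| - 1)^2 regardless of where x_hat lands:

```python
# Penalty for a toy linear critic f(x) = a*x: gradient norm is |a| everywhere.
def toy_penalty(a):
    grad_norm = abs(a)
    return (grad_norm - 1) ** 2

print(toy_penalty(3.0))  # 4.0  -> critic too steep, heavily penalized
print(toy_penalty(1.0))  # 0.0  -> unit slope (1-Lipschitz), no penalty
print(toy_penalty(0.5))  # 0.25 -> the two-sided penalty also pushes up overly flat critics
```

This is why the penalty targets a norm of exactly 1 rather than merely clamping it below 1: it keeps the critic's gradients informative everywhere along the interpolation line.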
Training will involve alternating between training the discriminator and the generator. You'll use your generator_loss and discriminator_loss functions to help you calculate the two losses.
You don't have to implement anything here but you can experiment with different hyperparameters.
# Model parameters
latent_dim = 128 # dimensions of latent space
conv_dim = 64 # control the number of filters
loss_type = 'bce' # loss type (bce or wasserstein)
gp_type = 'dragan' # gradient penalty type (None or wgangp or dragan)
use_batch_norm = True
use_layer_norm = False
if gp_type == 'wgangp':
    use_batch_norm = False
    use_layer_norm = True
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""
# Create optimizers for discriminator and generator
generator = Generator(latent_dim=latent_dim, conv_dim=conv_dim, use_batch_norm=use_batch_norm, use_layer_norm=use_layer_norm)
discriminator = Discriminator(conv_dim=conv_dim, size=size, use_batch_norm=use_batch_norm, use_layer_norm=use_layer_norm)
print("torch.cuda.device_count() = {}".format(torch.cuda.device_count()))
if device.type == "cuda" and torch.cuda.device_count() > 1:
    generator = nn.DataParallel(generator)
    discriminator = nn.DataParallel(discriminator)
generator = generator.to(device)
discriminator = discriminator.to(device)
g_optimizer, d_optimizer = create_optimizers(generator, discriminator)
Each function should do the following:
def generator_step(batch_size: int, latent_dim: int, loss_type: str) -> Dict:
    """One training step of the generator."""
    # Generator step (forward pass, loss calculation and backward pass)
    g_optimizer.zero_grad()
    # Generate fake images from a uniform latent vector
    z = np.random.uniform(-1, 1, size=(batch_size, latent_dim))
    z = torch.from_numpy(z).float().to(device)
    fake_images = generator(z)
    fake_logits = discriminator(fake_images)  # score fake images with flipped labels (forward pass)
    g_loss = generator_loss(fake_logits, loss_type)  # loss calculation
    g_loss.backward()  # backward pass
    g_optimizer.step()
    return {'loss': g_loss}
def discriminator_step(batch_size: int, latent_dim: int, real_images: torch.Tensor, loss_type: str, gp_type: str) -> Dict:
    """One training step of the discriminator."""
    # Discriminator step (forward pass, loss calculation and backward pass)
    d_optimizer.zero_grad()
    real_logits = discriminator(real_images)  # 1. score real images (forward pass)
    # Generate fake images; detach so this step doesn't backprop into the generator
    z = np.random.uniform(-1, 1, size=(batch_size, latent_dim))
    z = torch.from_numpy(z).float().to(device)
    fake_images = generator(z).detach()
    fake_logits = discriminator(fake_images)  # 2. score fake images (forward pass)
    gp = gradient_penalty(real_images, fake_images, discriminator, gp_type)  # gradient penalty
    d_loss = discriminator_loss(real_logits, fake_logits, gp, loss_type)  # loss calculation
    d_loss.backward()  # backward pass
    d_optimizer.step()
    return {'loss': d_loss, 'gp': gp}
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""
def display(fixed_latent_vector: torch.Tensor):
    """Helper function to display images during training."""
    fig = plt.figure(figsize=(14, 4))
    plot_size = 16
    for idx in np.arange(plot_size):
        ax = fig.add_subplot(2, int(plot_size/2), idx+1, xticks=[], yticks=[])
        img = fixed_latent_vector[idx, ...].detach().cpu().numpy()
        img = np.transpose(img, (1, 2, 0))
        img = denormalize(img)
        ax.imshow(img)
    plt.show()
You should experiment with different training strategies. For example:
Implement your training strategy below.
# Training parameters
batch_size = 64 # number of images in each batch
num_workers = 4
sample_size = 16
n_epochs = 10 # number of epochs
print_every = 50 # for printing progress
# Set one of the following to a value >= 1 and the other one, to 1
n_gen_batch_per_disc_batch = 1 # set more than 1 to train generator more than discriminator
n_disc_batch_per_gen_batch = 1 # set more than 1 to train discriminator more than generator
# Create dataloader
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers, drop_last=True, pin_memory=False)
fixed_latent_vector = torch.randn(sample_size, latent_dim).float().to(device)
losses = []
for epoch in range(n_epochs):
    for batch_i, real_images in enumerate(dataloader):
        real_images = real_images.to(device)
        batch_size = real_images.size(0)
        if batch_i % n_gen_batch_per_disc_batch == 0:
            d_loss = discriminator_step(batch_size, latent_dim, real_images, loss_type, gp_type)  # train the discriminator
        if batch_i % n_disc_batch_per_gen_batch == 0:
            g_loss = generator_step(batch_size, latent_dim, loss_type)  # train the generator
        if batch_i % print_every == 0:
            # append discriminator loss and generator loss
            d = d_loss['loss'].item()
            g = g_loss['loss'].item()
            losses.append((d, g))
            gp = d_loss['gp'].item()
            # print discriminator and generator loss
            time = str(datetime.now()).split('.')[0]
            print("{} | Epoch [{}/{}] | Batch {}/{} | lambda x gp: {} | d_loss: {} | g_loss: {}".format(
                time, epoch+1, n_epochs, batch_i, len(dataloader), round(lambda_p*gp, 4), round(d, 4), round(g, 4)))
    # display images at the end of each epoch
    generator.eval()
    generated_images = generator(fixed_latent_vector)
    display(generated_images)
    generator.train()
Plot the training losses for the generator and discriminator.
"""
DO NOT MODIFY ANYTHING IN THIS CELL
"""
fig, ax = plt.subplots()
losses = np.array(losses)
plt.plot(losses.T[0,1:], label='Discriminator', alpha=0.5)
plt.plot(losses.T[1,1:], label='Generator', alpha=0.5)
plt.title("Training Losses")
plt.legend()
When you answer this question, consider the following factors:
Answer:
When submitting this project, make sure to run all the cells before saving the notebook. Save the notebook file as "dlnd_face_generation.ipynb".
Submit the notebook using the SUBMIT button in the bottom right corner of the Project Workspace.